REPORT DOCUMENTATION PAGE

Form Approved
OMB No. 0704-0188

2. REPORT DATE / 3. REPORT TYPE AND DATES COVERED

FINAL 01 Aug 88 to 01 Jan 91

4. TITLE AND SUBTITLE

PARALLEL ALGORITHMS FOR LEAST SQUARES AND RELATED
COMPUTATIONS 

5. FUNDING NUMBERS

AFOSR-88-0285 

61102F  2304/A8 

6. AUTHOR(S)

ROBERT  J.  PLEMMONS 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

NORTH  CAROLINA  STATE  UNIVERSITY 

BOX 7003

RALEIGH NC 27695-7003

8. PERFORMING ORGANIZATION REPORT NUMBER

AFOSR-TR-91 0253

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)

AFOSR/NM

Bldg 410

Bolling AFB DC 20332-6448

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

AFOSR-88-0285 

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION / AVAILABILITY STATEMENT

Approved for public release;
distribution unlimited.

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

Algorithms  for  signal  processing  and  image  processing  have  been 
developed  and  implemented  on  a  variety  of  architectures  including  a 
Cray  Y-MP,  an  Intel  iPSC/2,  a  Connection  Machine,  and  an  Alliant 
FX/40. Research has also been conducted on iterative least squares
and  related  methods  with  applications  to  structural  analysis. 



16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: UNCLASSIFIED
18. SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED
19. SECURITY CLASSIFICATION OF ABSTRACT: UNCLASSIFIED
20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)


FINAL  REPORT 


Grant No.: AFOSR-88-0285
Grant  Period  30  mos.:  8-1-88  to  1-31-91 

Parallel  Algorithms 
for 

Least  Squares 
and  Related  Computations 


Robert  J.  Plemmons 

Dept.  Mathematics  &  Computer  Science 
Wake  Forest  University,  Box  7388 
Winston-Salem,  NC  27109 



Contents

Final Report - Table of Contents

1. Project Objectives
2. Summaries of Major Accomplishments
3. Graduate Students
4. Technical Publications
5. Invited Research Presentations


AFOSR  Final  Report 


Plemmons 


Report  on  AFOSR-88-0285 




1. Project Objectives

This  represents  a  comprehensive  final  report  on  our  research  project,  which  is  concerned  with  the 
design  and  testing  of  new  algorithms  for  least  squares  computations,  with  particular  emphasis  on 
applications  to  signal  and  image  processing  and  to  computational  methods  in  structural  analysis. 
The  primary  objectives  of  the  project  were  to  mathematically  develop,  test,  and  analyze  fast 
numerical  algorithms  for  the  efficient  solution  to  computational  problems  on  modern  high 
performance  computers. 

Matrix  computations,  including  the  solution  of  systems  of  linear  equations,  least  squares  problems 
and algebraic eigenvalue problems, govern the performance of many applications on modern vector
and  parallel  computers.  In  order  to  meet  some  of  the  challenges  of  this  emerging  new  generation 
of  machines,  the  goals  of  this  project  have  been  to  develop  techniques  in  matrix  computations 
for  efficient  implementation  on  advanced  computer  architectures.  Multiprocessing  technology  is 
still  evolving  and  is  predicted  to  be  the  dominant  methodology  in  the  computer  industry  by  the  next 
decade.  The  full  potential  of  parallel  computation  will  be  realized  in  signal  and  image  processing 
computations  only  when  parallel  architectures  are  combined  with  system  interfaces,  computational 
strategies,  and  numerical  algorithms  into  integrated  signal  environments.  Our  purpose  has  been  to 
address  each  of  these  topics.  The  research  described  here  is  expected  to  have  impacts  on  science 
and  engineering  as  part  of  a  continuing  development  of  the  computational  foundations  of 
technology.  As  in  other  fields,  scientific  computation  has  emerged  in  these  areas  as  a  third  major 
paradigm  beside  theory  and  experimentation. 

We  have  developed  and  implemented  algorithms  on  a  variety  of  architectures,  including:  a  Cray 
Y-MP  supercomputer  at  our  newly  operational  North  Carolina  Supercomputing  Center  in  the 
Research  Triangle  Park,  an  Intel  iPSC/2  distributed  memory  multiprocessor  at  the  Oak  Ridge 
National  Laboratory,  a  Connection  Machine  CM/2  at  the  Argonne  National  Laboratory,  and  our 
own  Alliant  FX/40  vector-multiprocessor.  The  Alliant  was  purchased  by  AFOSR  funds  under  the 
Defense  Department  DURIP  Program  in  1988. 

Applications  of  our  work  to  the  practical  real-world  problems  of  least  squares  estimation  methods 
in  signal  and  image  processing  and  to  computational  methods  in  structures  and  fluids  have  been 
made.  Our  main  objective  in  least  squares  computations  has  been  to  develop  fast  recursive 
algorithms  for  linear  prediction  of  one  and  two  dimensional  time  series  in  adaptive  signal 
processing:  identification,  estimation  and  control.  Our  schemes  are  amenable  to  implementation 
on  parallel  processing  systems,  especially  distributed  memory  architectures  such  as  the  Intel 
iPSC/2 Hypercube. This work in developing parallel algorithms for recursive least squares has
produced some especially important recent results which have led to several technical research
papers involving the principal investigator and his students during the reporting period.

Very recently, the principal investigator, along with co-researchers W. Ferng, G. Golub and D.
Pierce, has made some important strides in developing fast adaptive condition estimation schemes
for recursive matrix modifications. Condition estimation is important in assessing the reliability
and  accuracy  of  linear  predictors  in  signal  processing  and  control,  and  has  important  potential 
applications  to  target  tracking.  Figure  4  in  Section  2  illustrates  applications  to  target  tracking. 

Applications  of  our  work  on  image  processing  algorithms  include  large  scale  computations  for 
full  3D  image  reconstruction  problems.  We  would  like  to  accelerate  these  computations  by  the  use 
of  parallel  multilevel  methods.  Defense  Department  contractors,  such  as  General  Electric,  would 
like  to  build  (and  are  building)  massively  parallel  computer  systems  for  these  applications.  Even 
with  1024  processors,  the  computations  may  be  too  slow  for  the  DOD  applications,  e.g.,  fast 
inspection  of  jet  engine  rotor  blades:  thus  the  need  for  a  multilevel  approach  in  a  parallel 
environment to accelerate the computations. (For instance, the B-1 bomber did not participate in
the recent Gulf War, in part, because of defects in the engine turbine rotor blades.) One of my
former Ph.D. students, Air Force Major Douglas James, is involved in research on this topic. He
is  currently  assigned  to  the  U.S.  Air  Force  Academy. 

We  have  also  worked  in  the  areas  of  iterative  least  squares  and  related  methods  with 
applications  to  structural  analysis.  Here  we  are  concerned  with  the  solution  of  the  fundamental 
problems  of  equilibrium:  for  instance  in  elastic  analysis  -  that  of  finding  the  stresses  and  strains 
and  solving  redesign  problems,  given  a  finite  element  model  of  a  large  complex  structure  and  a  set 
of  external  loads.  To  obtain  the  solution  of  this  constrained  minimization  problem,  a  variety  of 
algorithms  involving  the  displacement  method  or  the  force  method  can  be  applied.  Our  work  on 
this  topic  has  led  to  new  approaches  involving  least  squares  conjugate  gradient  iterative  nullspace 
methods  combined  with  substructuring.  Major  Douglas  James  has  written  his  Ph.D.  dissertation 
in  this  area,  and  several  technical  publications  have  been  written  on  this  aspect  of  our  project. 

In  addition  to  research  with  direct  applications  to  the  areas  of  recursive  least  squares  methods  in 
signal  and  image  processing,  and  to  computational  mechanics,  the  principal  investigator  has  been 
involved,  along  with  K.  Gallivan  and  A.  Sameh  at  the  University  of  Illinois,  with  the  development 
of  a  comprehensive  treatise  (publications  7,  9)  on  parallel  algorithms  for  dense  computations  in 
linear  algebra.  The  work  has  recently  been  published  in  a  general  reference  book  on  parallel 
algorithms  by  SIAM. 

2. Summaries of Major Accomplishments

Abstracts  of  the  most  significant  research  findings  are  given  in  this  section  of  the  report. 
Referenced  publications  can  be  found  in  Section  4.  We  include  a  cumulative  record  of  research 
summaries  of  papers  written  under  this  grant. 

•  Least  Squares  Modifications  with  Inverse  Factorizations:  Parallel  implications. 

The  process  of  modifying  least  squares  computations  by  updating  the  covariance  matrix  has  been 
used  in  control  and  signal  processing  for  some  time  in  the  context  of  linear  sequential  filtering. 
Here we give a new approach to the process and provide extensions to downdating. Our purpose
is to develop algorithms that are amenable to implementation on modern multiprocessor
architectures. In particular, the inverse Cholesky factor $R^{-1}$ is considered and it is shown that the
inverse $R^{-1}$ can be updated (or downdated) by applying the same sequence of orthogonal
(hyperbolic) plane rotations that are used to update (downdate) $R$. We have attempted to provide
some  new  insights  into  least  squares  modification  processes  and  to  suggest  parallel  algorithms  for 
implementing  Kalman  type  filters  in  the  analysis  and  solution  of  estimation  problems  in  signal 
processing.  This  is  joint  work  with  former  Ph.D.  student  C.  Pan.  (See  publication  1.) 
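
The identity behind this result can be checked numerically. The following is a minimal Python sketch, not the implementation of publication 1: it folds one new data row y into an upper triangular factor R with Givens rotations and applies the same rotations to S, the transpose of the inverse factor, bordered by a zero row. The function names, the 3-by-3 test data, and the choice to store the inverse in transposed form are illustrative assumptions.

```python
import math

def inverse_transpose_upper(R):
    """Return S = R^{-T} for upper triangular R by solving R^T S = I."""
    n = len(R)
    S = [[0.0] * n for _ in range(n)]
    for k in range(n):              # column k of S
        for i in range(n):          # forward substitution, since R^T is lower triangular
            rhs = 1.0 if i == k else 0.0
            acc = sum(R[j][i] * S[j][k] for j in range(i))
            S[i][k] = (rhs - acc) / R[i][i]
    return S

def givens_update(R, S, y):
    """Fold the new row y into R; apply the same rotations to S = R^{-T}."""
    n = len(R)
    R = [row[:] for row in R]       # work on copies
    S = [row[:] for row in S]
    y = list(y)
    u = [0.0] * n                   # zero row bordering S
    for i in range(n):
        r = math.hypot(R[i][i], y[i])
        if r == 0.0:
            continue
        c, s = R[i][i] / r, y[i] / r
        for j in range(n):
            # Rotation in the plane (row i, appended row); it zeroes y[i].
            R[i][j], y[j] = c * R[i][j] + s * y[j], -s * R[i][j] + c * y[j]
            S[i][j], u[j] = c * S[i][j] + s * u[j], -s * S[i][j] + c * u[j]
    return R, S

# Hypothetical example data (not taken from the report).
R = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 4.0]]
y = [1.0, -2.0, 0.5]
S = inverse_transpose_upper(R)
R_new, S_new = givens_update(R, S, y)
```

In this sketch the updated factor satisfies R_new^T R_new = R^T R + y y^T, and S_new is the transposed inverse of R_new, so no triangular solve is needed after the update. Each rotation touches only one row of R and S plus the appended row.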

•  Least  Squares  Multiple  Updating  Algorithms  on  a  Hypercube. 

Parallel  algorithms  for  multiple  updating  methods  in  recursive  least  squares  computations  are 
investigated.  Comparisons  of  updating  algorithms  by  carefully  implemented  orthogonal 
Householder  and  Givens  algorithms  are  made  on  the  iPSC  hypercube  distributed  memory 
multiprocessor  system.  Overall,  the  performance  of  updating  by  orthogonal  Householder 
reflections using a row oriented storage scheme is superior to that of Givens rotations using a
row  oriented  scheme  and  the  greedy  Givens  sequence  on  the  hypercube,  for  our  application.  In 
particular,  the  communication  complexity  is  independent  of  the  number  of  vectors  being  updated. 
The  methods  we  describe  can  also  be  adapted  to  the  parallel  computation  of  general  orthogonal 
factorizations  involved  in  least  squares  problems.  We  have  in  mind  applications  to  windowed 
recursive  least  squares  filtering  schemes  for  near  real-time  computations  on  distributed  memory 
architectures.  This  is  joint  work  with  D.  Agrawal  and  Computer  Engineering  graduate  student  S. 
Kim.  (See  publication  2.) 

•  Recursive  Least  Squares  Computations  on  a  Vector-Multiprocessor. 

We  consider  parallel  implementations  of  algorithms  for  recursive  least  squares  computations  based 
upon  the  information  matrix  and  the  covariance  matrix  updating  methods.  The  target  architecture  is 
a shared-memory multiprocessor, and test results on an Alliant FX/40 system with vector-multiprocessors
(purchased through an AFOSR DURIP grant) demonstrate the parallel efficiencies
of  the  algorithms.  The  results  also  show  that  the  covariance  method  in  a  form  suggested  by  Pan 
and  Plemmons  is  easily  the  most  efficient  on  the  Alliant  multiprocessor  computer.  This  is  an 
invited  paper  presented  at  the  International  Conference  on  the  Mathematics  of  Networks  and 
Systems,  Amsterdam,  June  1989.  (See  publication  3.) 

•  Optimality  Relationships  for  p-cyclic  SOR. 

The optimality question for block p-cyclic SOR iterations discussed in the classic textbooks by
Young  and  Varga,  as  well  as  in  the  monograph  by  Berman  and  Plemmons,  is  answered  under 
natural  conditions  on  the  spectrum  of  the  block  Jacobi  matrix.  In  particular,  it  is  shown  that 
repartitioning  a  block  p-cyclic  matrix  into  a  block  q-cyclic  form,  q  <  p,  results  in  asymptotically 
faster SOR convergence for the same amount of work per iteration. As a consequence, block 2-cyclic
SOR is shown to be optimal under these conditions. New applications of this work include
p-cyclic  iterative  methods  for  queuing  network  analysis  and  constrained  least  squares  computations 
arising  in  structural  analysis.  This  is  joint  work  with  A.  Hadjidimos  at  Purdue  and  D.  Pierce  at 
Boeing.  (See  publication  4.) 
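
For orientation, the classical point SOR iteration in the simplest 2-cyclic setting can be sketched as follows. This is an illustrative Python sketch, not the block p-cyclic analysis of publication 4: the model problem (the matrix tridiag(-1, 2, -1) with a right-hand side of ones), the function name sor_solve, and the stopping tolerance are assumptions. For this consistently ordered 2-cyclic matrix the Jacobi spectral radius is cos(pi/(n+1)), so Young's classical formula gives the optimal relaxation parameter 2/(1 + sin(pi/(n+1))).

```python
import math

def sor_solve(n, omega, tol=1e-8, max_iter=100000):
    """Point SOR for tridiag(-1, 2, -1) x = ones; returns (x, sweeps used)."""
    b = [1.0] * n
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            gauss_seidel = (b[i] + left + right) / 2.0
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
        # Max-norm residual of the current iterate.
        res = 0.0
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            res = max(res, abs(b[i] - (2.0 * x[i] - left - right)))
        if res < tol:
            return x, sweep
    return x, max_iter

n = 20
omega_opt = 2.0 / (1.0 + math.sin(math.pi / (n + 1)))   # Young's formula for this model problem
x_gs, iters_gs = sor_solve(n, 1.0)                      # omega = 1 is Gauss-Seidel
x_opt, iters_opt = sor_solve(n, omega_opt)
print(iters_gs, iters_opt)
```

With the optimal parameter the sweep count drops substantially relative to Gauss-Seidel on this model problem, which is the kind of gain the optimality results above quantify for the block p-cyclic case.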

•  Substructuring  Methods  for  Computing  the  Nullspace  of  Equilibrium  Matrices. 

Equations of equilibrium arise in numerous areas of engineering. Applications to electrical networks,
structures and fluid flow are elegantly described in a recent book on applied mathematics
by Strang. The context in which equilibrium equations arise may be stated in the constrained
minimization form:

$$\min_x \; (x^T F x - x^T r) \quad \text{subject to} \quad Ex = s.$$

Here  F  is  generally  some  symmetric  positive  definite  matrix  associated  with  the  minimization 
problem. For example, F is the element flexibility matrix in the structures application. An important
approach (called the force method in structural optimization) to the solution to such problems
involves dimension reduction nullspace schemes based upon computation of a basis for the
nullspace  for  E.  In  our  approach  to  solving  such  problems  we  emphasize  the  parallel  computation 
of  a  basis  for  the  nullspace  of  E  and  examine  the  applications  to  structural  optimization  and  fluid 
flow.  Several  new  block  decomposition  and  node  ordering  schemes  are  suggested  and  reanalysis 
computations  are  investigated.  The  following  figure  illustrates  our  novel  approach,  which 
combines  substructuring  with  nullspace  decompositions. 

Figure 1. Truss Structure Problem Formulation.

Comparisons  of  these  schemes  are  also  made  with  those  of  C.  Hall  at  the  University  of  Pittsburgh, 
and  others,  for  fluid  flow  computations.  This  is  joint  work  with  R.  White.  (See  publication  5.) 
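
The dimension reduction behind the nullspace (force) method can be written out in a few lines. The derivation below is a sketch under stated assumptions: the objective is taken exactly as printed in the constrained minimization form above (without a factor of 1/2), and the symbols Z (a matrix whose columns form a basis for the nullspace of E), x_p (any particular solution of the constraint), and y (the reduced unknown) are introduced here for illustration.

```latex
% Assumptions: E Z = 0 with Z of full column rank, and E x_p = s.
\begin{align*}
x &= x_p + Z y
  && \text{(every feasible $x$ has this form)} \\
\phi(y) &= (x_p + Z y)^T F (x_p + Z y) - (x_p + Z y)^T r \\
\nabla_y \phi(y) &= 2 Z^T F (x_p + Z y) - Z^T r = 0 \\
(Z^T F Z)\, y &= Z^T \left( \tfrac{1}{2} r - F x_p \right)
\end{align*}
```

Because F is symmetric positive definite and Z has full column rank, the reduced matrix Z^T F Z is symmetric positive definite, so the reduced system has a unique solution and is a natural target for conjugate gradient iteration. The cost of the approach is dominated by obtaining Z, which is why the parallel computation of a nullspace basis is emphasized.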

•  An Iterative Substructuring Algorithm for Equilibrium Equations. The topic of iterative
substructuring methods, and more generally domain decomposition methods, has been extensively
studied over the past few years, and the topic is well advanced with respect to first and
second order elliptic problems. However, relatively little work has been done on more general
constrained least squares problems (or equivalent formulations) involving equilibrium equations
such as those arising, for example, in realistic structural analysis applications. The potential is
good for effective use of iterative algorithms on these problems, but such methods are still far from
being competitive with direct methods in industrial codes. The purpose of this paper is to investigate
an order reducing, preconditioned conjugate gradient method proposed by Barlow, Nichols
and Plemmons for solving problems of this type. The relationships between this method and
nullspace methods, such as the force method for structures and the dual variable method for fluids,
are examined. Convergence properties are discussed in relation to recent optimality results for
Varga's theory of p-cyclic SOR. We suggest a mixed approach for solving equilibrium equations,
consisting of both direct reduction in the substructures and the conjugate gradient iterative algorithm
to complete the computations. Some typical problems considered in our numerical tests are
indicated in the next figure.

Figure 2. Some Two and Three Dimensional Problems Considered.

This  is  joint  work  with  Major  Douglas  James  who  has  written  his  Ph.D.  dissertation  with  the 
principal  investigator.  (See  publication  6.) 

•  Parallel Algorithms for Dense Linear Algebra Computations. Our purpose in this
comprehensive 81 page typeset paper is to provide an overall perspective of parallel algorithms for
dense matrix computations in linear system solvers, least squares problems, eigenvalue and singular-value
problems, as well as rapid elliptic PDE solvers. Numerical linear algebra is a fundamental
tool which is indispensable to scientific and engineering research, and these computations are
becoming increasingly dependent upon the development and implementation of parallel algorithms
on modern high-performance computers. With this in mind we have attempted in this paper to
collect, describe, and put into perspective a selection of the more important parallel algorithms
for numerical linear algebra. We give a major new emphasis to certain computational primitives
whose efficient execution on parallel and vector computers is essential in order to obtain high performance
algorithms. This is joint work with K. Gallivan and A. Sameh from the University of
Illinois Center for Supercomputing Research and Development. (See publications 7, 9.)

•  Recursive Least Squares on a Hypercube Multiprocessor Using the Covariance
Factorization. We have developed an efficient parallel implementation of an algorithm for recursive
least squares computations based upon the covariance updating method. The target architecture
is a distributed-memory multiprocessor, and test results on an Intel iPSC/2 hypercube
demonstrate the parallel efficiency of the algorithm. A 64-node system is measured to execute the
algorithm over 48 times as fast as a single processor for the largest problem that fits on a single
node (fixed size speedup). Moreover, the computation times increase only slightly with an increase
in the number of processors when the problem size per processor remains constant.
Applications include robust regression in statistics and modification of the Hessian matrix in optimization,
but the primary motivation for this work is the need for fast recursive least squares computations
in signal processing. This is joint work with C. E. Henkel. (See publications 8, 11.)
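
The quoted fixed size speedup translates directly into a parallel efficiency. The short calculation below uses the standard definitions; the symbols T_1, T_p, S_p and E_p are notation introduced here and do not appear in the report.

```latex
% T_1: time on one node, T_p: time on p nodes for the same (fixed) problem size.
\begin{align*}
S_p &= \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}, \\
S_{64} &> 48 \;\Longrightarrow\; E_{64} > \frac{48}{64} = 0.75 .
\end{align*}
```

So the 64-node run uses the machine at better than 75 percent efficiency even though the problem is only as large as a single node can hold, which is the least favorable case for a distributed-memory system.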

•  Implicit  Nullspace  Iterative  Methods  for  Constrained  Least  Squares  Problems. 

We  propose  a  class  of  iterative  algorithms  for  solving  equality  constrained  least  squares  problems, 
generalizing  an  order-reducing  algorithm  first  analyzed  by  Barlow,  Nichols,  and  Plemmons. 
These  algorithms,  which  we  call  implicit  null  space  methods,  are  based  on  the  classical  nullspace 
method,  except  that  a  basis  for  the  nullspace  of  the  constraint  matrix  is  not  explicitly  formed.  The 
implicit  methods  allow  great  flexibility  in  the  choice  of  preconditioner,  and  are  suitable  for  parallel 
implementation  on  substructured  problems.  We  offer  some  numerical  results  for  both  structural 
engineering  applications  and  Stokes  Flow.  The  paper  is  by  Major  Douglas  James,  and  is  part  of 
his  dissertation  work  under  this  AFOSR  grant.  (See  publication  10). 

•  Order-Reducing Conjugate Gradients vs Block AOR for Constrained Least
Squares Problems. We compare the convergence properties of two iterative algorithms for
solving equality constrained least squares problems of the form

$$\min_y \|Gy - e\|_2 \quad \text{such that} \quad Ey = b.$$

The first algorithm, due to Barlow, Nichols, and Plemmons, applies a variation of the conjugate
gradient algorithm to a symmetric positive definite system which is smaller than the original problem.
The second, Block Accelerated Over-relaxation (AOR), is a two parameter generalization of block
SOR. Barlow, Nichols, and Plemmons have proven that their order-reducing conjugate gradient
algorithm converges faster than block SOR. We extend their result to show that the algorithm is
also superior to block AOR. Numerical examples arising in structural analysis confirm the analysis.
The paper is by Ph.D. student Major Douglas James, and is part of his dissertation work under
this AFOSR grant. (See publication 12.)

•  Fast Adaptive Condition Estimation. Recursive condition number estimates of matrices
are useful in many areas of scientific computing, including: recursive least squares computations,
optimization, eigenanalysis, and general nonlinear problems solved by linearization techniques
where matrix modification techniques are used. Our purpose in this paper is to propose a fast
adaptive condition estimator, which we call ace, for tracking the condition number of the modified
matrix over time, in terms of its triangular factors. ace is fast in the sense that only $O(n)$ operations
are required for n parameter problems, and is adaptive over time, i.e., estimates at time t
are used to produce estimates at time t + 1.


Figure 3. Performance of our Adaptive Condition Estimator on Signal Processing Data. (Two panels; horizontal axes: Number of Updates.)


Traditional condition estimators for triangular factors, such as the LINPACK and LAPACK type
schemes, generally require $O(n^2)$ operations. ace is in the spirit of the popular incremental
condition estimation scheme, ice, developed by Bischof and used in LAPACK, in the sense that
the estimates are based on max-min principles. Numerical experiments are reported (see Figure 3),
indicating that the scheme ace yields an accurate and robust, yet inexpensive, adaptive condition
estimator for recursive matrix modifications. This is joint work with D. Pierce at the Boeing
Company. (See publication 13.)

•  Tracking the Condition Number for RLS in Signal Processing. We apply a fast
adaptive condition estimation scheme, called ace, for recursive least squares (RLS) computations
in signal processing. ace is fast in the sense that only $O(n)$ operations are required for n
parameter problems, and is adaptive over time, i.e., estimates at time t are used to produce
estimates at time t + 1. RLS algorithms for linear prediction of time series are applied in various
fields of signal processing: identification, estimation and control. However, RLS algorithms are
known to suffer from numerical instability problems under finite word length conditions, due to
ill-conditioning. We develop adaptive procedures, linear in the order of the problem, for accurately
tracking relevant extreme singular values and associated condition numbers over time t. In this
paper exponentially-weighted data windows are considered. ace is in the spirit of an incremental
condition estimation scheme, ice, proposed by Bischof in conjunction with orthogonal
factorization. Numerical experiments indicate that ace yields a very accurate, yet inexpensive,
RLS condition estimator. The following figure illustrates how our Kalman filtering schemes find
applications in target tracking.


Figure 4. Tracking Filter based on updating Covariance Matrix. (Block diagram; labeled inputs: Disturbances, Sensor Noise.)

As a side benefit, ace also provides accurate bounds on the power spectral densities in the context
of adaptive filtering in signal processing. This is joint work with D. Pierce at the Boeing
Company. (See publication 14.)
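
For readers unfamiliar with the recursion whose conditioning is being tracked, the textbook exponentially weighted RLS update in covariance form can be sketched as follows. This Python sketch is not ace and not the factored algorithms of the publications above: the function name rls_step, the forgetting factor 0.99, the initial matrix, and the synthetic noiseless data are all assumptions chosen for illustration.

```python
import random

def rls_step(w, P, x, d, lam=0.99):
    """One exponentially weighted RLS step; returns the updated (w, P)."""
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]                    # gain vector
    err = d - sum(w[i] * x[i] for i in range(n))      # a priori prediction error
    w = [w[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain x^T P) / lam; x^T P equals (P x)^T because P is symmetric.
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w, P

true_w = [0.5, -1.5, 2.0]            # hypothetical coefficients to recover
rng = random.Random(0)
w = [0.0, 0.0, 0.0]
P = [[1e4 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    d = sum(a * b for a, b in zip(true_w, x))         # noiseless observation
    w, P = rls_step(w, P, x, d)
```

The matrix P plays the role of the covariance matrix in the tracking filter of Figure 4. When P becomes ill-conditioned, finite word length errors in this recursion grow, which is the motivation given above for tracking the condition number at each time step.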

•  Block Cyclic SOR for Markov Chains with p-cyclic Infinitesimal Generator.

The  block  SOR  method  for  the  computation  of  the  steady  state  distribution  of  finite  Markov  chains 
that  possess  p-cyclic  infinitesimal  generators  is  considered.  It  is  shown  that  convergence,  in  a 
sense  more  general  than  the  usual,  may  be  obtained,  even  if  the  SOR  iteration  matrix  violates  the 
usual  conditions  for  semiconvergence.  Necessary  and  sufficient  conditions  for  convergence  in 
this extended sense are derived. They are then applied in the case where the pth power of the
associated Jacobi matrix of the system to be solved possesses only nonnegative eigenvalues. Exact
convergence intervals and the optimal $\omega$ values are derived for this case. In addition to the
"usual" optimal $\omega$ in the interval $(1, \frac{p}{p-1})$, other $\omega$ values that yield convergence in the
extended sense are found to achieve the same optimal convergence rate. Numerical tests indicate
that small perturbations of $\omega$ around the optimal value affect the convergence factor much less if
these newly introduced optimal $\omega$ values are used. Numerical tests fully support the theory
developed  here.  This  is  joint  work  with  Kimon  Kontovasilis  and  William  J.  Stewart.  (See 
publication  15). 

•  Adaptive  Lanczos  Methods  for  Recursive  Condition  Estimation. 

Estimates  for  the  condition  number  of  a  matrix  are  useful  in  many  areas  of  scientific  computing, 
including:  recursive  least  squares  computations,  optimization,  eigenanalysis,  and  general  nonlinear 
problems  solved  by  linearization  techniques  where  matrix  modification  techniques  are  used.  The 
purpose of this paper is to propose an adaptive Lanczos estimator scheme, which we call ale, for
tracking  the  condition  number  of  the  modified  matrix  over  time.  Applications  to  recursive  least 
squares  (RLS)  computations  using  the  covariance  method  with  sliding  data  windows  are 
considered.  The  scheme  ale  is  fast  for  arbitrary  n  parameter  problems  arising  in  RLS  methods 
in  control  and  signal  processing,  and  is  adaptive  over  time,  i.e.,  estimates  at  time  t  are  used  to 
produce  estimates  at  the  next  time  step  t  +  1.  Comparisons  are  made  with  other  adaptive  and 
non-adaptive condition estimators for recursive least squares problems. Numerical experiments
reported in our studies indicate that ale yields a very accurate recursive condition estimator. This
is  joint  work  with  graduate  assistant  William  Ferng  and  with  Gene  H.  Golub  at  Stanford 
University.  (See  publication  16). 

•  Inverse Factorization Methods in Linear Prediction.

A  new  inverse  factorization  technique  is  presented  for  solving  linear  prediction  problems  arising  in 
signal  processing.  The  algorithm  is  similar  to  a  scheme  of  Luk  and  Qiao  in  that  it  uses  the 
rectangular  Toeplitz  structure  of  the  data  to  recursively  compute  the  prediction  error  and  to  solve 
the  problem  when  the  optimum  filter  order  has  been  found.  The  novelty  of  the  scheme  presented 
here  is  the  use  of  an  inverse  factorization  scheme  by  Pan  and  Plemmons  for  solving  the  linear 
prediction  problem  with  low  computational  complexity  and  without  the  need  for  solving  triangular 
systems.  We  also  provide  a  linear  systolic  array  for  solving  these  problems.  Extensions  of  this 
work  to  two  dimensional  signal  processing  problems  are  being  made.  Here,  one  works  with  block 
Toeplitz  matrices.  This  is  joint  work  with  graduate  assistant  James  Nagy  and  the  research  overlaps 
the  new  project:  AFOSR-91-0163.  (See  publication  17). 

•  Iterative  Lanczos-Based  Condition  Estimators  for  Linear  Systems. 

For  the  system  of  linear  equations  Ax  =  b  ,  with  a  fixed  nonsingular  matrix  A  ,  the  condition 
number  K(A)  is  important  since  it  provides  information  about  the  sensitivity  of  the  solution  to 
perturbations  in  the  data.  We  suggest  here  an  iterative  approach  to  estimating  the  condition  number 
of  A,  based  on  the  Lanczos  method.  We  call  this  scheme  ILE:  Iterative  Lanczos  Estimator.  The 
results  of  numerical  experiments  on  over  600  test  matrices  indicate  that  this  scheme  is  robust  and 
accurate.  Three  different  condition  estimators,  including  a  generalization  of  the  LINPACK 
algorithm  with  "look  behind"  strategy  suggested  by  Cline,  Conn  and  Van  Loan,  the  probabilistic 
condition  estimator  suggested  by  Higham,  and  the  incremental  condition  estimator  ICE  suggested 
by  Bischof,  are  compared  with  ILE.  Parallel  implementations  of  ILE  are  discussed  and 
computations  on  a  Cray  Y-MP  are  reported.  This  is  joint  work  with  graduate  assistant  William 
Ferng and the research overlaps the new project: AFOSR-91-0163. (See publication 18.)

3. Graduate Students

The following graduate students have worked under the principal investigator for this grant.

•  Douglas  James.  Ph.D.  August  1990.  Douglas  is  a  Major  in  the  U.S.  Air  Force.  He 
received  his  B.S.  degree  at  the  Air  Force  Academy  and  his  M.S.  degree  at  MIT  under  the  direction 
of Professor Gilbert Strang, before enrolling at NCSU to pursue his Ph.D. in mathematics under
the  direction  of  the  principal  investigator.  His  graduate  study  was  funded  by  the  Air  Force  Institute 
of  Technology,  WPAFB,  OH.  Major  James'  dissertation  topic  was  on  iterative  least  squares 
substructuring  methods.  Two  papers  from  his  dissertation  have  already  been  accepted  for 
publication,  and  a  third  was  given  as  an  invited  student  paper  at  a  national  SIAM  conference. 
Major  James  is  now  assigned  to  the  Air  Force  Academy. 

• William Ferng. William is pursuing his Ph.D. in mathematics with a minor in computer
science under the direction of the principal investigator. He did his undergraduate work in
engineering at Taiwan National University. Mr. Ferng has considerable parallel processing and
supercomputing experience. His dissertation is on parallel algorithms for least squares
computations. He is expected to graduate under the principal investigator's direction in 1992.

• James Nagy. James is pursuing his Ph.D. in mathematics with a minor in electrical and
computer engineering under the direction of the principal investigator. He did his undergraduate
and M.S. work at Northern Illinois University. His M.S. thesis at NIU involved a study of fast
Toeplitz algorithms in sequential estimation. His dissertation topic is signal and image processing:
identification, estimation and control. He is expected to graduate under the principal investigator's
direction in August 1991. He has been awarded a postdoctoral position with the Institute for
Mathematics and its Applications, University of Minnesota, for the academic year 1991-92.

4.  Technical  Publications 

1. Least squares modifications with inverse factorizations: parallel implications, J.
Computational  and  Appl.  Math.,  27(1989),  pp.  109-127  (with  C.  Pan).  Research  on  this 
paper  overlaps  AFOSR-83- 19500. 

2. Least squares multiple updating algorithms on a hypercube, Inter. J. Parallel and Dist.
Computing,  8  (1990),  pp.  80-88  (with  D.  Agrawal  and  S.  Kim). 

3. Recursive least squares on a vector-multiprocessor, Proc. International Symp.
MTNS-89, Sig. Proc. and Numer. Meth., Amsterdam 1989, Birkhäuser Press
Boston, Inc., 3 (1990), pp. 495-502.

4. Optimality relationships for p-cyclic SOR, Numer. Math., 56 (1990), pp. 635-643 (with
A.  Hadjidimos  and  D.  Pierce). 

5.  Substructuring  methods  for  computing  the  nullspace  of  equilibrium  matrices,  SIAM  J. 
Matrix  Analysis,  11  (1990),  pp.  1-22  (with  R.  White). 

6.  An  iterative  substructuring  algorithm  for  equilibrium  equations,  Numer.  Math.,  57(1990), 
pp.  625-633  (with  D.  James). 

7. Parallel algorithms for dense linear algebra computations, SIAM Review, 32 (1990),
pp. 54-135 (with K. Gallivan and A. Sameh).

8.  Parallel  recursive  least  squares  on  a  hypercube  multiprocessor,  in  Numerical  Linear 
Algebra,  Digital  Signal  Processing  and  Parallel  Algorithms,  NATO  ASI  Series  F, 
Springer-Verlag, Ed. by G. Golub and P. Van Dooren, (1990), pp. 571-579.

9. Parallel Algorithms for Matrix Computations, SIAM Press, Philadelphia PA,
November (1990), 197pp. (Joint with K. Gallivan, M. Heath, E. Ng, J. Ortega, B.
Peyton, C. Romine and R. Voigt).

10. Implicit nullspace iterative methods for constrained least squares problems. Invited student
paper at the Copper Mt. Conf. on Iterative Methods, Copper Mt., CO (1990), submitted to the
SIAM J. Sci. Stat. Comp. (by Ph.D. student Major Douglas James, and part of his
dissertation work supported under this AFOSR grant).

11. Recursive least squares on a hypercube multiprocessor using the covariance factorization,
SIAM  J.  Sci.  Stat.  Comp.,  12(1991),  pp.  95-106  (with  C.  Henkel). 

12. Order-reducing conjugate gradients vs block AOR for constrained least squares problems, to
appear in Lin. Alg. and Its Applications, (1991) (by Ph.D. student Major Douglas
James, and part of his dissertation work supported under this AFOSR grant).

13.  Fast  adaptive  condition  estimation,  to  appear  in  SIAM  J.  Matrix  Anal.  (1991)  (with  D. 
Pierce). 

14.  Tracking  the  condition  number  for  RLS  in  signal  processing,  to  appear  in  Mathematics  of 
Control,  Signals  and  Systems,  (1991)  (with  D.  Pierce). 

15. Block cyclic SOR for Markov chains with p-cyclic infinitesimal generator, Lin. Alg.
Applic. Special Issue on Iterative Methods, to appear (1991) (with K. Kontovasilis and W. J.
Stewart).

16.  Adaptive  Lanczos  methods  for  recursive  condition  estimation,  to  appear  in  Numerical 
Algorithms, (1991) (with William R. Ferng and Gene H. Golub).

17.  An  inverse  factorization  algorithm  for  linear  prediction,  preprint,  (1991)  (with  J.  Nagy). 
Research  on  this  paper  overlaps  the  new  project:  AFOSR-91-0163. 

18. Iterative Lanczos-based condition estimators, preprint, (1991) (with W. Ferng). Research on
this  paper  overlaps  the  new  project:  AFOSR-91-0163. 

5. Invited Research Presentations

• Recursive Least Squares on a Hypercube Multiprocessor, NATO Workshop on Num.
Lin. Alg., Signal Proc. and Parallel Algorithms, Leuven, Belgium, August 1988.

•  Least  Squares  Computations  in  Signal  Processing,  University  of  Illinois-Urbana, 
Colloquium,  October  1988. 

•  Parallel  Algorithms  for  Recursive  Least  Squares  Computations,  University  of 
Pittsburgh  and  Carnegie-Mellon,  Colloquium,  December  1988. 

•  An  Iterative  Substructuring  Algorithm  for  Equilibrium  Equations,  R.  S.  Varga  Conf.  on 
Approx.  Theory  and  Numerical  Linear  Algebra,  Kent  OH,  March  1989. 

•  Recursive  Least  Squares  Computations,  International  Conf.  on  Mathematics  in 
Networks and Systems, Amsterdam, June 1989.

•  Modified  Least  Squares  Methods,  Minisymp.  -  SIAM  Conf.,  San  Diego  CA,  July  1989. 

• Fast RLS Methods in Signal Processing, Boeing Aircraft Company, Seattle WA,
Colloquium,  August  1989. 

•  Least  Squares  and  Related  Computations  on  High  Performance  Architectures,  Wake  Forest 
University,  Colloquium,  January  1990. 

•  Panel  Presentation  on  Scientific  Computing,  ACM  -  IEEE  Supercomputing  '89,  Reno 
NV,  November  1989. 

•  Department  Heads  Presentation:  What  is  Scientific  Computing?,  ACM  Annual 
Computer  Science  Conference,  Washington,  DC,  February  1990. 

•  Fast  Adaptive  Condition  Estimation  in  Signal  Processing,  Householder  XI  Symposium, 
Halmstad,  Sweden,  June  1990. 

• Conjugate Gradient Methods for Least Squares Computations, Symposium at the USSR
Academy  of  Sciences,  Moscow,  Russia,  July  1990. 

•  Software  Needs  in  Signal  Processing,  DARPA  Workshop  on  Science  and  Technology 
for  the  90's,  Oak  Ridge,  TN,  September  1990. 

•  Comparison  of  Adaptive  Condition  Estimators  in  Signal  Processing  and  Control,  SIAM 
Conf.  on  Lin.  Alg.  in  Signals,  Systems  and  Control,  San  Francisco,  CA, 
November  1990. 


AFOSR  Final  Report 


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
3. DISTRIBUTION / AVAILABILITY OF REPORT: Approved for public release; distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S):
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: North Carolina State University
6b. OFFICE SYMBOL (if applicable):
6c. ADDRESS (City, State and ZIP Code): Depts. of Mathematics and Computer Science, NC State University, Raleigh, NC 27695-8205
7a. NAME OF MONITORING ORGANIZATION: Air Force Office of Scientific Research
7b. ADDRESS (City, State and ZIP Code): Mathematical & Information Sciences, Bolling AFB DC 20332-6448
8a. NAME OF FUNDING / SPONSORING ORG.: AFOSR
8b. OFFICE SYMBOL (if applicable): NM
8c. ADDRESS (City, State and ZIP Code): Bolling AFB, DC 20332-6448
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: AFOSR-88-0285
10. SOURCE OF FUNDING NUMBERS (PROJECT NO., TASK NO., WORK UNIT ACCESSION NO.): [illegible]
11. TITLE (Include Security Classification): Parallel Algorithms for Least Squares and Related Computations
12. PERSONAL AUTHOR(S): Dr. Robert J. Plemmons
13a. TYPE OF REPORT: Final
13b. TIME COVERED: FROM 8/1/88 TO 1/31/91
14. DATE OF REPORT (Year, Month, Day): 3/22/91
15. PAGE COUNT:
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES (FIELD / GROUP / SUB-GROUP):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): Least squares, numerical linear algebra, parallel algorithms, signal processing, structural analysis

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

This final report summarizes the activities in support of the Air Force research project,
AFOSR-88-0285, and identifies the important accomplishments over the period of the grant.

Our research project has been concerned with the design and testing of new algorithms for
least squares computations with particular emphasis on applications to signal processing and
to optimization methods in structural analysis, as well as to related problems in science and
engineering. The objectives were to mathematically develop, test, and analyze fast numerical
algorithms for the efficient solution to computational problems on modern high performance
computers. Our recent work on fast recursive least squares computations in signal and image
processing and computational methods in structural analysis has led to especially significant
results. Some highlights of these results are outlined in this report.

20. DISTRIBUTION / AVAILABILITY OF ABSTRACT: [ ] UNCLASSIFIED/UNLIMITED  [x] SAME AS RPT  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: Dr. Neal D. Glassman
22b. TELEPHONE (Include Area Code): (202) 767-5028
22c. OFFICE SYMBOL: NM

[Copied from DD FORM 1473 (3-84)]

SECURITY CLASSIFICATION OF THIS PAGE
UNCLASSIFIED


